1. Read in the gapminder_clean.csv data as a tibble using read_csv.
library(tidyverse)
library(plotly)

df = as_tibble(read.csv("gapminder_clean.csv", check.names=FALSE)[,-1])

head(df)
## # A tibble: 6 x 19
##   `Country Name`  Year `Agriculture, value a~` `CO2 emissions~` `Domestic cred~`
##   <chr>          <int>                   <dbl>            <dbl>            <dbl>
## 1 Afghanistan     1962                      NA           0.0738            21.3 
## 2 Afghanistan     1967                      NA           0.124              9.92
## 3 Afghanistan     1972                      NA           0.131             18.9 
## 4 Afghanistan     1977                      NA           0.183             13.8 
## 5 Afghanistan     1982                      NA           0.166             NA   
## 6 Afghanistan     1987                      NA           0.276             NA   
## # ... with 14 more variables:
## #   `Electric power consumption (kWh per capita)` <dbl>,
## #   `Energy use (kg of oil equivalent per capita)` <dbl>,
## #   `Exports of goods and services (% of GDP)` <dbl>,
## #   `Fertility rate, total (births per woman)` <dbl>,
## #   `GDP growth (annual %)` <dbl>,
## #   `Imports of goods and services (% of GDP)` <dbl>, ...
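The prompt asks for `read_csv`, while the code above goes through base `read.csv` plus `as_tibble`. As a rough sketch of the `read_csv` route, the example below writes a small stand-in file to a temporary path and reads it back. The toy rows and the assumption that the real file starts with an unnamed index column are illustrative only.

```r
library(readr)
library(dplyr)

# Toy stand-in for gapminder_clean.csv (illustrative rows, not the real file).
# Assumption: the first column is an unnamed row index, as implied by [,-1] above.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  ",Country Name,Year,gdpPercap",
  "0,Afghanistan,1962,853.1",
  "1,Afghanistan,1967,836.2"
), tmp)

# read_csv() returns a tibble directly and keeps column names as written,
# so neither as_tibble() nor check.names = FALSE is needed.
toy <- read_csv(tmp, col_types = cols()) %>%
  select(-1)  # drop the first column by position

names(toy)
class(toy$Year)
```

One difference to expect: `read_csv` guesses whole-number columns as double by default, so `Year` would print as `<dbl>` rather than the `<int>` shown in the output above.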
2. Filter the data to include only rows where Year is 1962 and then make a scatter plot comparing 'CO2 emissions (metric tons per capita)' and gdpPercap for the filtered data.
df_1962 = df %>% filter(Year == 1962) 

p = df_1962 %>% filter(!is.na(gdpPercap) & !is.na(`CO2 emissions (metric tons per capita)`)) %>% 
  ggplot(aes(x = gdpPercap, y = `CO2 emissions (metric tons per capita)`, text=`Country Name`)) + 
  geom_point() + labs(title="CO2 emissions vs GDP", x="GDP Per Capita") + theme_bw() + scale_y_log10() + scale_x_log10()
  
ggplotly(p)
3. On the filtered data, calculate the correlation of 'CO2 emissions (metric tons per capita)' and gdpPercap. What is the correlation and associated p value?
# cor.test() drops incomplete pairs on its own; it has no `use` argument (that belongs to cor())
cor.test(df_1962$gdpPercap, df_1962$`CO2 emissions (metric tons per capita)`)
## 
##  Pearson's product-moment correlation
## 
## data:  df_1962$gdpPercap and df_1962$`CO2 emissions (metric tons per capita)`
## t = 25.269, df = 106, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8934697 0.9489792
## sample estimates:
##       cor 
## 0.9260817
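To pull the correlation and p value out of the test object rather than reading them off the printout, something like the following should work. It is a minimal sketch on made-up vectors (not values from the dataset) and also shows that `cor.test` silently ignores pairs containing `NA`.

```r
# Toy vectors with missing values (illustrative only, not from the dataset)
x <- c(1, 2, 3, 4, 5, NA, 7)
y <- c(2.1, 3.9, NA, 8.2, 9.8, 12.0, 14.1)

res <- cor.test(x, y)                    # incomplete pairs are dropped automatically
n_complete <- sum(complete.cases(x, y))  # number of pairs actually used

res$estimate    # Pearson's r
res$p.value     # two-sided p value
res$parameter   # degrees of freedom, equal to n_complete - 2
```

By the same relationship, `df = 106` in the output above corresponds to 108 complete pairs for 1962.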
4. On the unfiltered data, answer “In what year is the correlation between 'CO2 emissions (metric tons per capita)' and gdpPercap the strongest?” Filter the dataset to that year for the next step…
df %>%
  group_by(Year) %>%
  summarize(Correlation=cor(gdpPercap, `CO2 emissions (metric tons per capita)`,use = "complete.obs")) %>%
  arrange(desc(Correlation))
## # A tibble: 10 x 2
##     Year Correlation
##    <int>       <dbl>
##  1  1967       0.939
##  2  1962       0.926
##  3  1972       0.843
##  4  1982       0.817
##  5  1987       0.810
##  6  1992       0.809
##  7  1997       0.808
##  8  2002       0.801
##  9  1977       0.793
## 10  2007       0.720
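Sorting by `desc(Correlation)` finds the strongest year here because every correlation is positive. If some years had negative correlations, "strongest" would more naturally mean largest in absolute value. A small sketch of that variant, using toy data and the hypothetical column names `x` and `y`:

```r
library(dplyr)

# Toy data (illustrative only): one year with a strong positive correlation,
# one year with a perfect negative correlation.
toy <- tibble(
  Year = rep(c(1962, 1967), each = 4),
  x = c(1, 2, 3, 4, 1, 2, 3, 4),
  y = c(1, 2, 3, 5, 4, 3, 2, 1)
)

strongest <- toy %>%
  group_by(Year) %>%
  summarize(Correlation = cor(x, y, use = "complete.obs")) %>%
  slice_max(abs(Correlation), n = 1)  # rank by magnitude, not by sign

strongest
```

On the real table above the two approaches agree, since 1967 has both the largest and the largest-magnitude correlation.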
5. Using plotly, create an interactive scatter plot comparing 'CO2 emissions (metric tons per capita)' and gdpPercap, where the point size is determined by pop (population) and the color is determined by the continent. You can easily convert any ggplot plot to a plotly plot using the ggplotly() command.
df_1967 = df %>% filter(Year == 1967) 

p = df_1967 %>% filter(!is.na(gdpPercap) & !is.na(`CO2 emissions (metric tons per capita)`))  %>% 
  ggplot(aes(x = gdpPercap, y = `CO2 emissions (metric tons per capita)`, text=`Country Name`, col=continent, size=pop)) + 
  geom_point() + labs(title="CO2 emissions vs GDP", x="GDP Per Capita") + theme_bw() + scale_y_log10() + scale_x_log10()
  
ggplotly(p)